
Get started with Machine Learning in Azure

Introduction

  • ML solutions form the foundation of today's AI applications.
  • ML solutions support the latest technological advances in society by using existing data to produce new insights.
  • Data scientists can tackle ML problems with different design decisions.
  • These decisions affect the cost, speed, quality, and longevity of the solution.
  • Following are the steps to design an ML solution with Azure that can be used in enterprise settings.

Diagram showing the six steps of the machine learning process.

  1. Define the problem: Decide what the model should predict.
  2. Get the data
  3. Prepare the data: Clean and transform based on model's requirements.
  4. Train the model: Choose an algorithm and hyperparameter values based on trial and error.
  5. Integrate the model: Deploy the model, get the endpoint to generate predictions.
  6. Monitor the model: Track the model performance.
Note:

The diagram is a simplified representation of the machine learning process. Typically, the process is iterative and continuous. For example, when monitoring the model you may decide to go back and retrain the model.

Define the problem

  • The problem is defined by understanding:
    • What the model should predict.
    • The type of ML task to use.
    • The criteria that make the model successful.
  • The ML task can be identified based on the data and the expected output of the model.
  • Based on the task, the type of algorithm to use to train the model can be determined.
  • Following are common ML tasks.
    1. Classification: Predict a categorical value.
    2. Regression: Predict a numerical value.
    3. Time-series forecasting: Predict future numerical values based on time-series data.
    4. Computer vision: Classify images or detect objects in images.
    5. Natural language processing (NLP): Extract insights from text.
  • For each task, there is a set of algorithms available to train the model.
  • To evaluate the model, performance metrics such as accuracy or precision can be calculated.
  • The relevant metrics also depend on the task, and they help you decide whether the model is successful.
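As a small illustration of evaluating a classification model, the sketch below computes accuracy and precision with scikit-learn; the label values are hypothetical and only serve to show how the metrics are calculated.

```python
from sklearn.metrics import accuracy_score, precision_score

# Hypothetical ground-truth labels and model predictions for a binary task
# (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))   # fraction of all predictions that are correct
print("Precision:", precision_score(y_true, y_pred))  # fraction of predicted positives that are correct
```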

Explore an example

  • Consider a scenario where you want to determine whether a patient has diabetes.
  • In this case, the available data is health data from other patients.
  • The output you want is categorical: a patient either has diabetes or doesn't.
  • Thus, the machine learning task is classification.
  • Following is a diagram showing one way to approach this problem.

Diagram showing the seven steps to train a model.

  1. Load data
  2. Preprocess data: Normalize and clean for consistency.
  3. Split data: Separate into training and test sets.
  4. Choose model: Select and configure an algorithm.
  5. Train model
  6. Score model: Generate predictions on test data.
  7. Evaluate: Calculate performance metrics.
  • Training a machine learning model is often an iterative process, where you go through each of these steps multiple times to find the best performing model.
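A minimal sketch of these seven steps with scikit-learn is shown below. It uses a synthetic dataset from make_classification as a stand-in for real patient data; the feature count, split ratio, and algorithm choice (logistic regression) are illustrative assumptions rather than part of the original scenario.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Load data (synthetic stand-in for patient health measurements).
X, y = make_classification(n_samples=1000, n_features=8, random_state=42)

# 2. Preprocess data: normalize features for consistency.
#    (In practice, the scaler is usually fit on the training set only.)
X = StandardScaler().fit_transform(X)

# 3. Split data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 4. Choose and configure an algorithm.
model = LogisticRegression(max_iter=1000)

# 5. Train the model.
model.fit(X_train, y_train)

# 6. Score the model: generate predictions on the test data.
y_pred = model.predict(X_test)

# 7. Evaluate: calculate performance metrics.
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
```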

Get and prepare data

  • Data is the foundation of ML.
  • Both data quantity and quality affect the model's accuracy.
  • To train an ML model, you need to:
    • Identify data source and format.
    • Choose how to serve data.
    • Design a data ingestion solution.
  • To get and prepare data, you need to extract it from its source and make it available to the Azure service that you want to use to train the model or make predictions.

Identify data source and format

  • Data source: For example, the data can be stored in a Customer Relationship Management (CRM) system, in a transactional database like a SQL database, or be generated by an Internet of Things (IoT) device.
  • Data format: You need to understand the current format of the data, which can be tabular or structured data, semi-structured data, or unstructured data.
  • Then, you need to decide what data you need to train your model, and in what format you want that data to be served to the model.

Design a data ingestion solution

  • You want to extract the data from some source, transform it, and load it into a destination.
  • This process is also referred to as Extract, Transform, and Load (ETL).
  • To move and transform data, a data ingestion pipeline can be used.
  • The pipeline contains a sequence of tasks that move and transform the data.
  • You can choose to trigger the pipeline tasks manually or run them on a schedule.
  • Such pipelines can be created using Azure Synapse Analytics, Azure Databricks, and also Azure Machine Learning.
  • A common approach for a data ingestion solution is to:
    1. Extract raw data from its source (like a CRM system or IoT device).
    2. Copy and transform the data with Azure Synapse Analytics.
    3. Store the prepared data in an Azure Blob Storage.
    4. Train the model with Azure Machine Learning.

Diagram showing an example of a data ingestion pipeline.

Explore an example

  • Consider training a weather forecasting model.
  • For this, you need a table that lists the temperature measurements for each minute.
  • To get the table, you first need to convert the semi-structured JSON data from the devices into a tabular format.
  • Then you need to calculate the average temperature for each minute.
  • For example, to create a dataset you can use to train the forecasting model, you can:
    • Extract data measurements as JSON objects from the IoT devices.
    • Convert the JSON objects to a table.
    • Transform the data to get the temperature per machine per minute.
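A hedged sketch of that transformation with pandas is shown below; the JSON field names (device, timestamp, temperature) are hypothetical examples of what the IoT measurements might look like, not a defined schema.

```python
import json
import pandas as pd

# Hypothetical JSON measurements as they might arrive from the IoT devices.
raw = """
[
  {"device": "machine-1", "timestamp": "2024-01-01T10:00:05", "temperature": 21.3},
  {"device": "machine-1", "timestamp": "2024-01-01T10:00:35", "temperature": 21.9},
  {"device": "machine-2", "timestamp": "2024-01-01T10:00:12", "temperature": 19.4}
]
"""

# Convert the JSON objects to a table.
df = pd.DataFrame(json.loads(raw))
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Transform: average temperature per machine per minute.
per_minute = (
    df.set_index("timestamp")
      .groupby("device")["temperature"]
      .resample("1min")
      .mean()
      .reset_index()
)
print(per_minute)
```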

Diagram showing an example of JSON data converted to a table.

Train the model

  • There are many services that can be used to train ML models.
  • Azure Machine Learning gives you many different options to train and manage your machine learning models. You can choose to work with the studio for a UI-based experience, or manage your machine learning workloads with the Python SDK or CLI for a code-first experience.
  • Azure Databricks is a data analytics platform that you can use for data engineering and data science. Azure Databricks uses distributed Spark compute to efficiently process your data. You can choose to train and manage models with Azure Databricks or by integrating Azure Databricks with other services such as Azure Machine Learning.
  • Microsoft Fabric is an integrated analytics platform designed to streamline data workflows between data analysts, data engineers, and data scientists. With Microsoft Fabric, you can prepare data, train a model, use the trained model to generate predictions, and visualize the data in Power BI reports.
  • Azure AI Services is a collection of prebuilt machine learning models you can use for common machine learning tasks such as object detection in images. The models are offered as an application programming interface (API), so you can easily integrate a model with your application. Some models can be customized with your own training data, saving time and resources compared to training a new model from scratch.

Features and capabilities of Azure Machine Learning

  • Azure ML is a cloud service that can be used to train, deploy and manage ML models.
  • It is designed to be used by data scientists, software engineers, DevOps professionals, and others to manage the lifecycle of ML projects.
  • Azure ML supports the following tasks:
    • Exploring data and preparing it for modeling.
    • Training and evaluating machine learning models.
    • Registering and managing trained models.
    • Deploying trained models for use by applications and services.
    • Reviewing and applying responsible AI principles and practices.
  • Azure ML provides the following to support ML workloads:
    • Centralized storage of datasets for model training and evaluation.
    • On-demand compute resources.
    • Automated machine learning (AutoML), which makes it easy to run multiple training jobs with different algorithms and parameters to find the best model for your data.
    • Visual tools to define orchestrated pipelines for processes such as model training or inferencing.
    • Integration with common ML frameworks such as MLflow, which make it easier to manage model training, evaluation, and deployment at scale.
    • Built-in support for visualizing and evaluating metrics for responsible AI, including model explainability, fairness assessment, and others.
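For the code-first experience, a hedged sketch with the Azure ML Python SDK v2 (azure-ai-ml) is shown below. It connects to an existing workspace and submits a training script as a command job; the subscription, workspace, compute, script, and curated environment names are placeholders and may differ in your setup.

```python
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

# Connect to an existing workspace (placeholder identifiers).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Submit a training script as a command job on a compute cluster.
job = command(
    code="./src",                          # folder containing the hypothetical train.py
    command="python train.py",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # example curated environment
    compute="cpu-cluster",
    experiment_name="diabetes-training",
)
returned_job = ml_client.create_or_update(job)
print(returned_job.studio_url)  # link to monitor the job in the studio
```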

Use Azure Machine Learning studio

  • Azure ML studio is a web-based portal for managing ML resources and jobs, and it provides several additional capabilities.
  • In Azure ML studio, you can:
    • Import and explore data.
    • Create and use compute resources.
    • Run code in notebooks.
    • Use visual tools to create jobs and pipelines.
    • Use automated machine learning to train models.
    • View details of trained models, including evaluation metrics, responsible AI information, and training parameters.
    • Deploy trained models for real-time and batch inferencing.
    • Import and manage models from a comprehensive model catalog.

Provisioning Azure Machine Learning resources

  • The only resource that needs to be created to use Azure ML is an Azure ML workspace.
  • It can be created from the Azure portal.
  • All the other supporting resources, like storage accounts, container registries, and virtual machines, are created automatically as needed.
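If you prefer code over the portal, a workspace can also be created with the Python SDK. The sketch below is an assumption-laden example: the subscription, resource group, workspace name, and region are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

# Client scoped to a subscription and resource group (placeholders).
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Create the workspace; supporting resources (storage account, key vault, etc.)
# are provisioned automatically if they don't already exist.
workspace = Workspace(name="mlw-example", location="eastus")
ml_client.workspaces.begin_create(workspace).result()
```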

Decide between compute options

  • To use Azure ML to train a model, you need to select the compute resources required to perform the training process.
  • Central Processing Unit (CPU) or Graphics Processing Unit (GPU): For smaller tabular datasets, a CPU is sufficient and cost-effective. For unstructured data like images or text, GPUs are more powerful and efficient. GPUs can also be used for larger tabular datasets, if CPU compute is proving to be insufficient.
  • General purpose or memory optimized: Use general purpose to have a balanced CPU-to-memory ratio, which is ideal for testing and development with smaller datasets. Use memory optimized to have a high memory-to-CPU ratio, which is great for in-memory analytics and ideal when you have larger datasets or work in notebooks.
  • Which compute fits best is often a matter of trial and error.
  • It's a good practice to monitor the time taken and the compute utilized to train a model.
  • By monitoring compute utilization, you know whether to scale the compute up or down.
  • For example, if the training process takes too long even with the largest compute size, it may be better to use a GPU instead of a CPU.
  • Alternatively, you can choose to distribute model training by using Spark compute, which requires you to rewrite your training scripts. A compute cluster sketch follows this list.
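As a hedged example, the sketch below creates an Azure ML compute cluster with the Python SDK that scales between 0 and 4 nodes, so you only pay while jobs are running; the cluster name, VM size, and workspace identifiers are placeholders you would adjust to your dataset and task.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Placeholder workspace identifiers.
ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>")

cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",          # general-purpose CPU size; choose a GPU size for image/text workloads
    min_instances=0,                 # scale to zero when idle to avoid cost
    max_instances=4,
    idle_time_before_scale_down=120, # seconds of idle time before scaling down
)
ml_client.compute.begin_create_or_update(cluster).result()
```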

Azure Automated Machine Learning

  • Azure ML's automated ML capabilities automatically assign compute.

  • It automates the time-consuming, iterative tasks of ML model development.

  • In Azure ML studio, Automated ML can be used to design and run the training experiments without needing to write code.

  • It provides a step-by-step wizard to help run ML training jobs.

  • It can be used for many tasks, such as regression, classification, computer vision, and NLP.

  • With AutoML, you can use your own datasets, and the resulting ML models can be deployed as services.
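Automated ML jobs can also be submitted in code. The sketch below is a hedged example with the Python SDK v2; the registered data asset, target column, compute name, and limits are hypothetical placeholders.

```python
from azure.ai.ml import MLClient, Input, automl
from azure.identity import DefaultAzureCredential

# Placeholder workspace identifiers.
ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>")

# Configure an automated ML classification job over a registered MLTable data asset (hypothetical name).
classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="automl-diabetes",
    training_data=Input(type="mltable", path="azureml:diabetes-training:1"),
    target_column_name="Diabetic",
    primary_metric="accuracy",
    n_cross_validations=5,
)
classification_job.set_limits(timeout_minutes=60, max_trials=5)

returned_job = ml_client.jobs.create_or_update(classification_job)
```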

Integrate a model

  • You should plan how to integrate the model early, because it affects the way you train the model and the training data you use.
  • To integrate the model, it needs to be deployed to an endpoint for either real-time or batch predictions.

Deploy a model to an endpoint

Get real-time predictions

  • As the name suggests, real-time predictions are generated immediately, as the model receives the data.
  • For example, a product recommendation system.
  • When the user clicks on a product on the website, the model immediately recommends other related products to the user.
  • The model should be able to return the recommendations in the time it takes the website to load the webpage.

Diagram showing a website of a web shop. A shirt is shown at the top and the recommendations, based on the shirt, are shown at the bottom.
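As a hedged illustration, the sketch below deploys a model to a managed online endpoint for real-time predictions using the Python SDK v2; the endpoint name, model path, instance size, and workspace identifiers are placeholders, and it assumes the model is in MLflow format so no scoring script or environment needs to be supplied.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint, Model
from azure.identity import DefaultAzureCredential

# Placeholder workspace identifiers.
ml_client = MLClient(DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>")

# Create the endpoint that applications call for real-time predictions.
endpoint = ManagedOnlineEndpoint(name="recommender-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Deploy a hypothetical MLflow-format model behind the endpoint.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="recommender-endpoint",
    model=Model(path="./model", type="mlflow_model"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```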

Get batch predictions

  • If the model needs to score a collection of data and store the results in a file or database, batch predictions can be used.
  • For example, sales forecasting.
  • The model can be trained to predict each future week's sales.
  • The predictions can be used to ensure there is enough supply of material to meet the demand.
  • It requires calling the model only once a week to get the next week's predictions.
  • A collection of data points scored together is called a batch.

Decide between real-time or batch deployment

  • Answering the following questions helps you decide which type of deployment is required:
    • How often should predictions be generated?
    • How soon are the results needed?
    • Should predictions be generated individually or in batches?
    • How much compute power is needed to execute the model?

Identify the necessary frequency of scoring

  • Before generating the predictions, the first step is to collect the new data.
  • The data can be collected at different time intervals.
  • Generally there are two use cases:
    • The model is required to score new data as soon as it comes in.
    • The model scores new data collected over time, either on a schedule or when it is triggered.

Diagram showing a visual representation of real-time and batch predictions.

  • Whether to use real-time or batch predictions doesn't necessarily depend on how often the data is collected.

Decide on the number of predictions

  • Another important factor is whether the predictions need to be generated individually or in batches.
  • In simple words: should the model predict data for each customer individually, or predict data for all customers at once?

Consider the cost of compute

  • In addition to the compute used while training a model, compute is also required to deploy a model.

  • If real-time predictions are required, the compute is expected to be always available and to return results almost immediately.

  • Container technologies like Azure Container Instances (ACI) and Azure Kubernetes Service (AKS) are ideal for such scenarios, as they provide a lightweight infrastructure for your deployed model.

  • In such a scenario, once the model is deployed, the compute is always on.

  • Hence, you pay continuously, because you can't stop the compute: the model must be available at all times for predictions.

  • In the case of batch predictions, you need compute that can handle large workloads.

  • Ideally, use a compute cluster that can score the data in parallel batches by using multiple nodes.

  • In such a case, the compute is provisioned by the workspace when the batch job is triggered and scaled down to 0 nodes when there is no new data to process.

  • This can save significant cost.